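Embodied AI has shown promising results on a rich set of robotic tasks in simulation, including visual navigation and manipulation. Prior work generally pursues high success rates along the shortest paths while largely ignoring the problems caused by collisions during interaction. This lack of prioritization is understandable: in simulated environments there is no inherent cost to breaking virtual objects. As a result, well-trained agents frequently have catastrophic collisions with objects despite eventually succeeding. In the robotics community, where collisions are costly, collision avoidance is a long-standing and crucial topic for ensuring that robots can be safely deployed in the real world. In this work, we take the first step towards collision/disturbance-free embodied AI agents for visual mobile manipulation, facilitating safe deployment on real robots. We develop a new disturbance-avoidance methodology, at the heart of which is the auxiliary task of disturbance prediction. When combined with a disturbance penalty, our auxiliary task greatly enhances sample efficiency and final performance by distilling knowledge of disturbance into the agent. Our experiments on ManipulaTHOR show that, on testing scenes with novel objects, our method improves the success rate from 61.7% to 85.6% and the success rate without disturbance from 29.8% to 50.2% over the original baseline. Extensive ablation studies show the value of our pipelined approach. The project site is at https://sites.google.com/view/disturb-free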
translated by 谷歌翻译
Transforming off-the-shelf deep neural network (DNN) models into dynamic multi-exit architectures can achieve inference and transmission efficiency by fragmenting a large DNN model and distributing it across edge computing scenarios (e.g., edge devices and cloud servers). In this paper, we propose a novel backdoor attack specifically targeting dynamic multi-exit DNN models. In particular, we inject a backdoor by poisoning the shallow hidden layers of a DNN model, targeting not this vanilla DNN model but only its dynamically deployed multi-exit architectures. Our backdoored vanilla model performs normally, and its backdoor cannot be activated even with the correct trigger. However, the backdoor is activated once victims acquire this model and transform it into a dynamic multi-exit architecture at deployment. Extensive experiments on three structures (ResNet-56, VGG-16, and MobileNet) with four datasets (CIFAR-10, SVHN, GTSRB, and Tiny-ImageNet) demonstrate the effectiveness of our attack, and show that our backdoor is stealthy enough to evade multiple state-of-the-art backdoor detection and removal methods.
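To make the term "dynamic multi-exit architecture" concrete, the following is a minimal sketch of confidence-based early-exit inference. It is not the attack described in the abstract, and it is not taken from the paper: the function names, the toy blocks, and the exit rule (stop once the top softmax probability reaches a threshold) are assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_exit_predict(
    x: Vector,
    blocks: Sequence[Layer],
    exit_heads: Sequence[Layer],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Run the backbone block by block and return (predicted_class, exit_index).

    Hypothetical exit rule: after each block, the attached exit head produces
    logits; inference stops at the first exit whose top softmax probability
    reaches `threshold`, or at the final exit if none does.
    """
    if not blocks or len(blocks) != len(exit_heads):
        raise ValueError("need one exit head per backbone block")
    h = x
    last = len(blocks) - 1
    for i, (block, head) in enumerate(zip(blocks, exit_heads)):
        h = block(h)                      # shallow layers run first
        probs = softmax(head(h))          # internal classifier at this depth
        confidence = max(probs)
        if confidence >= threshold or i == last:
            return probs.index(confidence), i
    raise AssertionError("unreachable: the final exit always returns")


# Toy usage: blocks scale the features, heads pass them through as logits.
double = lambda h: [2.0 * v for v in h]
times20 = lambda h: [20.0 * v for v in h]
as_logits = lambda h: list(h)

prediction, exit_index = multi_exit_predict(
    [0.1, 0.2], [double, times20, double], [as_logits] * 3
)
```

In a deployment of this kind, the early blocks and their exit heads would typically run on the edge device and the deeper blocks on a server, so the shallow layers take part in every prediction.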
Modern autonomous driving systems are characterized as modular tasks in sequential order, i.e., perception, prediction, and planning. As sensors and hardware improve, there is a growing trend to devise a system that can perform a wide diversity of tasks to achieve higher-level intelligence. Contemporary approaches resort to either deploying standalone models for individual tasks or designing a multi-task paradigm with separate heads. These might suffer from accumulated errors or negative transfer effects. Instead, we argue that a favorable algorithm framework should be devised and optimized in pursuit of the ultimate goal, i.e., planning for the self-driving car. Oriented toward this goal, we revisit the key components within perception and prediction. We analyze each module and prioritize the tasks hierarchically, such that all these tasks contribute to planning (the goal). To this end, we introduce Unified Autonomous Driving (UniAD), the first comprehensive framework to date that incorporates full-stack driving tasks in one network. It is carefully designed to leverage the advantages of each module and to provide complementary feature abstractions for agent interaction from a global perspective. Tasks communicate through a unified query design so that they facilitate each other toward planning. We instantiate UniAD on the challenging nuScenes benchmark. Extensive ablations show that this philosophy surpasses previous state-of-the-art methods by a large margin in all aspects. The full suite of code and models will be made available to facilitate future research in the community.
To better handle long-tail cases in the sequence labeling (SL) task, in this work we introduce graph neural network sequence labeling (GNN-SL), which augments the vanilla SL model output with similar tagging examples retrieved from the whole training set. Since not all the retrieved tagging examples benefit the model prediction, we construct a heterogeneous graph and leverage graph neural networks (GNNs) to transfer information between the retrieved tagging examples and the input word sequence. The augmented node, which aggregates information from its neighbors, is used to make the prediction. This strategy enables the model to directly acquire similar tagging examples and improves the general quality of predictions. We conduct a variety of experiments on three typical sequence labeling tasks, Named Entity Recognition (NER), Part of Speech Tagging (POS), and Chinese Word Segmentation (CWS), to demonstrate the strong performance of GNN-SL. Notably, GNN-SL achieves SOTA results of 96.9 (+0.2) on PKU, 98.3 (+0.4) on CITYU, 98.5 (+0.2) on MSR, and 96.9 (+0.2) on AS for the CWS task, and results comparable to SOTA performance on the NER and POS datasets.
Photo-realistic style transfer aims at migrating the artistic style from an exemplar style image to a content image, producing a result image without spatial distortions or unrealistic artifacts. Impressive results have been achieved by recent deep models. However, deep neural network based methods are too expensive to run in real time. Meanwhile, bilateral grid based methods are much faster but still produce artifacts such as overexposure. In this work, we propose the Adaptive ColorMLP (AdaCM), an effective and efficient framework for universal photo-realistic style transfer. First, we find that the complex non-linear color mapping between the input and target domains can be efficiently modeled by a small multi-layer perceptron (ColorMLP). Then, in AdaCM, we adopt a CNN encoder to adaptively predict all parameters of the ColorMLP, conditioned on each input content and style image pair. Experimental results demonstrate that AdaCM can generate vivid and high-quality stylization results. Meanwhile, AdaCM is ultrafast and can process a 4K resolution image in 6 ms on one V100 GPU.
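The idea of a small color-mapping MLP whose parameters come from outside the network can be sketched as follows. This is an illustration under stated assumptions, not the AdaCM implementation: the layer sizes, the ReLU activation, the flat parameter layout, and the names `param_count`, `unpack_params`, `color_mlp`, and `stylize` are all hypothetical, and the CNN encoder that would predict the parameters is replaced by a hand-written parameter vector.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
LayerParams = Tuple[List[Vector], Vector]  # (weight rows, bias)


def param_count(sizes: Sequence[int]) -> int:
    """Number of scalars needed for an MLP with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def unpack_params(flat: Sequence[float], sizes: Sequence[int]) -> List[LayerParams]:
    """Split a flat parameter vector (as a hypothetical encoder might emit it)
    into per-layer weight matrices and biases."""
    if len(flat) != param_count(sizes):
        raise ValueError("parameter vector does not match the layer sizes")
    layers: List[LayerParams] = []
    pos = 0
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [list(flat[pos + r * n_in: pos + (r + 1) * n_in]) for r in range(n_out)]
        pos += n_in * n_out
        bias = list(flat[pos: pos + n_out])
        pos += n_out
        layers.append((weights, bias))
    return layers


def color_mlp(pixel: Sequence[float], layers: Sequence[LayerParams]) -> Vector:
    """Map one RGB pixel to a new RGB value; ReLU on hidden layers only."""
    h = list(pixel)
    for idx, (weights, bias) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, bias)]
        if idx < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


def stylize(image, flat_params: Sequence[float], sizes: Sequence[int] = (3, 8, 3)):
    """Apply the same color mapping independently to every pixel."""
    layers = unpack_params(flat_params, sizes)
    return [[color_mlp(px, layers) for px in row] for row in image]


# Toy usage: a single linear layer that adds 0.1 to the red channel.
params = [1, 0, 0,  0, 1, 0,  0, 0, 1,  0.1, 0.0, 0.0]
out = stylize([[[0.2, 0.3, 0.4]]], params, sizes=(3, 3))
```

Because the mapping depends only on each pixel's color and not on its neighbors, a function of this shape cannot introduce spatial distortions, and the per-pixel work is small and independent across pixels.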
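3D human pose and shape estimation (a.k.a. "human mesh recovery") has achieved substantial progress. Researchers mainly focus on the development of novel algorithms, while less attention has been paid to other critical factors involved. This can lead to suboptimal baselines, hindering fair and faithful evaluation of newly designed methods. To address this problem, this work presents the first comprehensive benchmarking study from three under-explored perspectives beyond algorithms. 1) Datasets. An analysis of 31 datasets reveals the distinct impacts of data samples: datasets featuring critical attributes (i.e., diverse poses, shapes, camera characteristics, and backbone features) are more effective. Strategic selection and combination of high-quality datasets can significantly boost model performance. 2) Backbones. Experiments with 10 backbones, ranging from CNNs to transformers, show that knowledge learned from a proximity task is readily transferable to human mesh recovery. 3) Training strategies. Proper augmentation techniques and loss designs are crucial. With the above findings, we achieve a PA-MPJPE of 47.3 mm on the 3DPW test set with a relatively simple model. More importantly, we provide strong baselines for fair comparisons of algorithms, and recommendations for building effective training configurations in the future. The codebase is available at http://github.com/smplbody/hmr-benchmarks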
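Is more data always better for training vision-and-language models? We study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that joining multiple datasets from different tasks will improve overall performance. However, we show that not all knowledge transfers well or has a positive impact on related tasks, even when they share a common goal. We conduct an analysis based on hundreds of cross-experiments on vision-and-language tasks categorized into 4 groups. Although tasks in the same group are prone to improve each other, the results show that this is not always the case. Other factors, such as dataset size or training stage, also have a great impact on how well the knowledge is transferred.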
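The goal of click-through rate (CTR) prediction is to predict the probability that a user will click on an item, a task that is becoming increasingly important in recommender systems. Recently, some deep learning models able to automatically extract user interest from his/her behaviors have achieved great success. In these works, an attention mechanism is used to select the items a user is interested in from historical behaviors, improving the performance of the CTR predictor. Normally, these attentive modules can be trained jointly with the base predictor using gradient descent. In this paper, we regard user interest modeling as a feature selection problem, which we call user interest selection. For this problem, we propose a novel approach under the framework of the wrapper method, named Meta-Wrapper. More specifically, we use a differentiable module as the wrapping operator and then recast its learning problem as a continuous bilevel optimization. Moreover, we use a meta-learning algorithm to solve the optimization and theoretically prove its convergence. We also provide theoretical analysis showing that our proposed method 1) makes wrapper-based feature selection more efficient, and 2) is more resistant to overfitting. Finally, extensive experiments on three public datasets demonstrate the superiority of our method in boosting the performance of CTR prediction.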
In this paper, we propose a unified whole-body control framework for velocity-controlled mobile collaborative robots, which can distribute task motion between the arm and the mobile base according to specific task requirements by adjusting weighting factors. Our framework focuses on addressing two challenging issues in whole-body coordination: 1) the different dynamic characteristics of the mobile base and the arm; 2) avoiding violations of both safety and configuration constraints. In addition, our controller incorporates Coupling Dynamic Movement Primitives to enable capabilities essential for collaboration and interaction applications, such as obstacle avoidance, human teaching, and compliance control. Based on these, we design an adaptive motion mode for intuitive physical human-robot interaction by adjusting the weighting factors. The proposed controller is in closed form and is therefore computationally efficient. Several typical experiments carried out on a real mobile collaborative robot validate the effectiveness of the proposed controller.
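One standard closed-form way to split a task velocity between a base and an arm through weighting factors is the weighted least-norm solution, sketched below for a one-dimensional task. This is a textbook technique offered as an illustration, not the controller proposed in the abstract; the function name `distribute_task_velocity` and the toy Jacobian are assumptions. It minimizes the weighted sum of squared joint velocities, sum_i w_i * qdot_i^2, subject to J · qdot = v.

```python
from typing import List, Sequence


def distribute_task_velocity(
    jacobian: Sequence[float],
    weights: Sequence[float],
    task_velocity: float,
) -> List[float]:
    """Weighted least-norm joint velocities for a scalar task.

    Closed form: qdot = W^-1 J^T (J W^-1 J^T)^-1 v, with W = diag(weights).
    A larger weight on a degree of freedom makes it move less.
    """
    if len(jacobian) != len(weights):
        raise ValueError("one weight per degree of freedom is required")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    gains = [j / w for j, w in zip(jacobian, weights)]      # W^-1 J^T
    denom = sum(j * g for j, g in zip(jacobian, gains))     # J W^-1 J^T
    if denom == 0:
        raise ValueError("the task is not controllable with this Jacobian")
    return [g * task_velocity / denom for g in gains]


# Toy usage: degree of freedom 0 is the base, 1 is the arm; both move the
# end effector along the task axis with unit gain. Penalizing the arm three
# times as much shifts most of the motion to the base.
base_rate, arm_rate = distribute_task_velocity([1.0, 1.0], [1.0, 3.0], 1.0)
```

With equal weights the motion is shared evenly; changing the ratio of the weights moves the split continuously between base-dominant and arm-dominant behavior while the task velocity is still met exactly.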
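A common challenge facing robust semantic segmentation is the expensive cost of data annotation. Existing semi-supervised solutions show great potential for solving this problem. Their key idea is to construct consistency regularization for model training by applying unsupervised data augmentation to unlabeled data. The perturbations of the unlabeled data enable the consistency training loss, which benefits semi-supervised semantic segmentation. However, these perturbations destroy image context and introduce unnatural boundaries, which is harmful to semantic segmentation. Moreover, the widely adopted semi-supervised learning framework, i.e., Mean-Teacher, suffers from a performance limitation because the student model eventually converges to the teacher model. In this paper, first, we propose a context-friendly differentiable geometric warping to conduct unsupervised data augmentation. Second, we propose a novel adversarial dual-student framework that improves on Mean-Teacher in two respects: (1) the dual student models are learned independently, except for a stabilization constraint, to encourage exploiting model diversity; (2) an adversarial training scheme is applied to both students, and the discriminators are used to identify reliable pseudo-labels of unlabeled data for self-training. Effectiveness is validated via extensive experiments on PASCAL VOC2012 and Cityscapes. Our solution significantly improves performance and achieves state-of-the-art results on both datasets. Remarkably, compared with full supervision, our solution achieves a comparable mIoU of 73.4% using only 12.5% of the annotated data on PASCAL VOC2012. Our code and models are available at https://github.com/caocong/ads-semiseg.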
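For readers unfamiliar with the Mean-Teacher baseline mentioned above, the sketch below shows its two core ingredients in their generic form: a teacher whose weights are an exponential moving average (EMA) of the student's, and a consistency loss between their predictions on unlabeled data. It does not implement the dual-student framework of the abstract; the function names, the decay value, and the mean-squared-error loss are assumptions for illustration.

```python
from typing import List, Sequence


def ema_update(teacher: Sequence[float], student: Sequence[float], decay: float = 0.99) -> List[float]:
    """Return new teacher weights: decay * teacher + (1 - decay) * student."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of weights")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def consistency_loss(student_probs: Sequence[float], teacher_probs: Sequence[float]) -> float:
    """Mean squared error between student and teacher predictions on the
    same unlabeled input (each typically seen under a different perturbation)."""
    if len(student_probs) != len(teacher_probs):
        raise ValueError("prediction vectors must have the same length")
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


# Toy usage: with a fixed student, repeated EMA updates pull the teacher
# toward the student's weights.
teacher_weights = [0.0, 0.0]
student_weights = [1.0, 2.0]
for _ in range(500):
    teacher_weights = ema_update(teacher_weights, student_weights, decay=0.9)
```

The toy loop illustrates the tight coupling between the two models: the teacher is built only from the student's weights, which is the coupling the abstract identifies as a performance limitation.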